Audio Chord Extraction Using a Probabilistic Model
Abstract
This paper presents our submission to the MIREX 2008 Audio Chord Detection task. The front-end of our system incorporates a novel feature extractor that uses multiple pitch tracking techniques to extract, for each frame, a chroma profile that is more robust against chroma contributions originating not from fundamental frequencies but from their harmonics. The back-end of our system implements a probabilistic framework for the simultaneous recognition of chords and keys. The system works with probabilities and density functions derived from Lerdahl's tonal distance metric and consequently needs no explicit training.

1 IMPLEMENTATION OVERVIEW

Input wave files are converted to mono, resampled to 8 kHz and split into frames. The frame length is 150 ms and the hop size is 20 ms. For each frame, the front-end calculates a chroma profile. Consecutive frames are grouped into segments of 10 frames to improve the stability of the output and to speed up the calculation. The average chroma profiles of these segments are then supplied to the back-end.

The back-end generates a chord label for each segment. This label represents one of the four triads (major, minor, diminished and augmented) that can be defined for each of the 12 chromas. For the MIREX task, however, only the major and minor triads were retained; the diminished and augmented triads were mapped to a no-chord. The key output of the back-end was discarded as well.

The present implementation works offline, but it could be changed into a stream-based system with little or no performance loss. It runs 96% real-time on an Intel Pentium M 1.86 GHz with 1 GB of RAM. On average, 13% of the time is spent on the resampling step, 22% on the front-end and 65% on the back-end. Many opportunities for speed-up are available and have not yet been exploited.
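The framing and segment averaging described in the implementation overview can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and two details the text does not specify are assumed here, namely that an incomplete frame at the end of the signal is dropped and that a trailing group of fewer than 10 frames is still averaged into a segment.

```python
SAMPLE_RATE_HZ = 8000      # from the text: input is resampled to 8 kHz
FRAME_LENGTH_S = 0.150     # from the text: 150 ms frames
HOP_SIZE_S = 0.020         # from the text: 20 ms hop size
FRAMES_PER_SEGMENT = 10    # from the text: frames are grouped per 10


def frame_starts(num_samples):
    """Start indices (in samples) of all complete frames.

    Assumption: a final frame that would run past the end is dropped.
    """
    frame_len = round(FRAME_LENGTH_S * SAMPLE_RATE_HZ)  # 1200 samples
    hop = round(HOP_SIZE_S * SAMPLE_RATE_HZ)            # 160 samples
    return list(range(0, num_samples - frame_len + 1, hop))


def segment_average(chroma_frames, frames_per_segment=FRAMES_PER_SEGMENT):
    """Average 12-bin chroma profiles over consecutive groups of frames.

    Assumption: a trailing group with fewer frames is averaged as well.
    """
    segments = []
    for start in range(0, len(chroma_frames), frames_per_segment):
        group = chroma_frames[start:start + frames_per_segment]
        count = len(group)
        segments.append(
            [sum(frame[k] for frame in group) / count for k in range(12)]
        )
    return segments
```

Under these assumptions, consecutive segments start 10 hops (200 ms) apart, so the back-end would receive one averaged chroma profile, and emit one chord label, for roughly every 200 ms of audio.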
2 THE FRONT-END OF THE SYSTEM

As in many other systems, the acoustic observations are chroma profiles, but the calculation of these profiles differs from what is commonly used. In its simplest form, such a profile is just a log-frequency representation of the spectral content folded into a single octave. The problem with such a representation is that, for example, the third harmonic of a pitch folds into a chroma located at +7 or -5 semitones with respect to the fundamental, thus adding evidence for a second pitch class that is not necessarily present in the signal.

Our front-end uses the novel implementation proposed by Varewyck et al. [3]. It aims at maximally coupling the higher harmonics to their fundamental frequency by applying multiple pitch tracking techniques. Ideally, if that coupling were perfect, the chroma profile would represent only notes that are actually played, and chord detection would mainly be a matter of pattern matching.

The values of the chroma profile are scaled such that they add up to 1, making them insensitive to the intensity of the sound. Fundamental frequencies lower than 100 Hz are considered to be bass notes and are not allowed to contribute to the profile. Although such bass notes could make a significant addition to the chord, they mostly just repeat a note from the higher registers or do not contribute to the chord at all (e.g. a walking bass), and we therefore argue that including them does more harm than good.

A consequence of using a pitch tracker for chroma profile generation is that, if no frequency is supported as a fundamental frequency by the presence of higher harmonics, the chroma profile will be a vector of zeros. At the moment, such a profile does not yet cause the back-end to generate a no-chord, but this is one of the planned improvements of our system.

3 THE BACK-END OF THE SYSTEM
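Looking back at the front-end description in Section 2, the harmonic folding problem and the profile scaling can be checked with a short sketch. It is an illustration only and does not reproduce the pitch tracker of Varewyck et al.: the function names are hypothetical, the list of (fundamental frequency, salience) pairs stands in for whatever the pitch tracker outputs, and a tuning of A4 = 440 Hz with pitch class 0 = C is assumed.

```python
import math

BASS_CUTOFF_HZ = 100.0  # from the text: fundamentals below 100 Hz are ignored


def harmonic_chroma_offset(harmonic):
    """Semitone offset (mod 12) of a harmonic relative to its fundamental."""
    return round(12 * math.log2(harmonic)) % 12


def pitch_class(freq_hz, a4_hz=440.0):
    """Pitch class of a frequency, 0 = C (assumes A4 = 440 Hz, MIDI note 69)."""
    midi_note = round(69 + 12 * math.log2(freq_hz / a4_hz))
    return midi_note % 12


def chroma_from_fundamentals(f0_saliences):
    """Build a 12-bin chroma profile from (f0_hz, salience) pairs.

    Bass fundamentals are skipped and the profile is scaled to sum to 1.
    If nothing contributes, the profile stays a vector of zeros.
    """
    chroma = [0.0] * 12
    for f0_hz, salience in f0_saliences:
        if f0_hz < BASS_CUTOFF_HZ:
            continue  # bass note: not allowed to contribute
        chroma[pitch_class(f0_hz)] += salience
    total = sum(chroma)
    if total == 0.0:
        return chroma
    return [value / total for value in chroma]
```

For example, `harmonic_chroma_offset(3)` evaluates to 7, the +7 semitone displacement mentioned in Section 2 (equivalently -5 within one octave), while the second and fourth harmonics fold back onto the fundamental's own chroma.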
Similar resources
Some notes about chords estimation from audio
Here are some notes about my work on chord detection from audio at the University of Tokyo this summer. An acoustic model is first made from audio. It is based on chroma vectors extraction. Language models are then used along with it in order to get the most likely sequence of chords. I have previously been working on probabilistic modeling of chord sequences using the n-gram model, and the pre...
A Probabilistic Framework for Audio-Based Tonal Key and Chord Recognition
A unified probabilistic framework for audio-based chord and tonal key recognition is described and evaluated. The proposed framework embodies an acoustic observation likelihood model and key & chord transition models. It is shown how to conceive these models and how to use music theory to link key/chord transition probabilities to perceptual similarities between keys/chords. The advantage of a ...
Mid-level Features for Audio Chord Estimation using Stacked Denoising Autoencoders
Deep neural networks composed of several pre-trained layers have been successfully applied to various tasks related to audio processing. Stacked denoising autoencoders represent one type of such networks. They are discussed in this paper in application to audio feature extraction for audio chord estimation task. The features obtained from audio spectrogram with the help of autoencoders can be u...
MIREX Submissions for Audio Chord Detection (No Training) and Structural Segmentation
This paper describes our approach to chord extraction from audio, a variant of which was submitted to the 2009 MIREX Chord Detection Task (No Training), and achieved the top ranking of 71.2%. The structural segmentation algorithm is a pre-processing step for the chord extraction, and was also submitted separately for the Structural Segmentation Task. It also achieved the top ranking in that cat...
A Unified System for Chord Transcription and Key Extraction Using Hidden Markov Models
A new approach for acoustic chord transcription and key extraction is presented. We use a novel method of acquiring a large set of labeled training data for automatic key/chord recognition from the raw audio without the enormously laborious process of manual annotation. To this end, we first perform harmonic analysis on symbolic data to extract the key information and the chord labels with prec...
Using Musical Structure to Enhance Automatic Chord Transcription
Chord extraction from audio is a well-established music computing task, and many valid approaches have been presented in recent years that use different chord templates, smoothing techniques and musical context models. The present work shows that additional exploitation of the repetitive structure of songs can enhance chord extraction, by combining chroma information from multiple occurrences o...